1. Import Data

Data Description

The dataset is about unemployment data in the State of NY, and it contains monthly information. This project working on forecasting unemployment which helps understand the economic health of a region.

library(readr)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2   4.0.0      ✔ fma       2.5   
## ✔ forecast  8.24.0     ✔ expsmooth 2.3
## 
library(TTR)
raw_data_csv <- read_csv("Data_NewYorkEdited.csv")
## Rows: 564 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State
## dbl (8): Code, Year, Month, NIP, Population, EmployedLF, Unemployment, Unemp...
## num (2): LF, Employment
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(raw_data_csv)
## # A tibble: 6 × 11
##    Code State          Year Month    NIP     LF Population Employment EmployedLF
##   <dbl> <chr>         <dbl> <dbl>  <dbl>  <dbl>      <dbl>      <dbl>      <dbl>
## 1 51000 New York city  1976     1 5.69e6 3.07e6       53.9    2723016       47.8
## 2 51000 New York city  1976     2 5.69e6 3.07e6       53.9    2722421       47.8
## 3 51000 New York city  1976     3 5.69e6 3.06e6       53.9    2722931       47.9
## 4 51000 New York city  1976     4 5.69e6 3.07e6       53.9    2726299       47.9
## 5 51000 New York city  1976     5 5.69e6 3.07e6       54      2730681       48  
## 6 51000 New York city  1976     6 5.68e6 3.08e6       54.1    2735049       48.1
## # ℹ 2 more variables: Unemployment <dbl>, UnemployedLF <dbl>
spec(raw_data_csv)
## cols(
##   Code = col_double(),
##   State = col_character(),
##   Year = col_double(),
##   Month = col_double(),
##   NIP = col_double(),
##   LF = col_number(),
##   Population = col_double(),
##   Employment = col_number(),
##   EmployedLF = col_double(),
##   Unemployment = col_double(),
##   UnemployedLF = col_double()
## )
ts_unemployed <- ts(raw_data_csv$UnemployedLF, start=c(1976,1), end = c(2019,12), frequency=12)

head(ts_unemployed)
##       Jan  Feb  Mar  Apr  May  Jun
## 1976 11.2 11.2 11.2 11.1 11.1 11.1
tail(ts_unemployed)
##      Jul Aug Sep Oct Nov Dec
## 2019 3.8 3.8 3.8 3.8 3.9 4.0

2. Data Overview

2.1 Time Series Plot

plot(ts_unemployed, ylab = "UnemployedLF", col= "red")

Acf(ts_unemployed)

2.2 Summary of time series plot Observations

The time series of UnmploymentLF from Jan 1976 to Dec 2019 shows several key characteristics:

  • trend: A gradual downward long-term trend, with unemployment levels decreasing toward the end of the series.

  • Strong seasonality: No clear seasonality is observed. Instead of seasonal repetition, the series exhibits multi-year cyclical movements.

  • Increasing variability:Variability is not increasing over time; fluctuations remain relatively consistent, though cycles differ in amplitude across decades.

Overall, the series displays pronounced long-term cycles and strong persistence, with slow ACF decay indicating long-range dependence rather than seasonal structure.

2.3 Remove data plot and ACF plot

To generate short-term forecasts (1–2 years), a rolling window of recent data is preferred because the unemployment rate experienced structural changes over the full 40-year span. The period after 2010 is much more stable, with consistent downward movement and lower volatility. Using this window improves model stability and prevents earlier high-variance periods from distorting short-term forecasts.

unemployed_ts <- window(ts_unemployed, start = c(2010,1))

plot(unemployed_ts, main = "Unemployment Level in Labor Force")

Acf(unemployed_ts)

2.4 Summary of ACF Plot Observations

The ACF shows very strong and persistent autocorrelation that decays slowly, indicating a non-stationary series with a strong underlying trend. There is no visible seasonal pattern, as correlations do not show repeating peaks at regular lags.

2.5 Central Tendency

# Display summary statistics
summary(unemployed_ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.800   4.500   6.250   6.737   9.100  10.100
# Create a box plot for visualizing distribution
boxplot(unemployed_ts,
        main = "Boxplot of Unemployment (2010–2019)",
        ylab = "unemployment(Percentage)",
        col = "lightblue",
        border = "darkblue")

2.6 Summary of Statistics and Box Plot Observations

The unemployment rates from 2010–2019 range between 3.8 and 10.1, with a median around 6.25, indicating that half of the values fall between 4.5 and 9.1. The Boxplot shows a wide IQR, suggesting substantial variability across the decade, and no extreme outliers. Overall, the distribution indicates that unemployment was generally moderate but fluctuated significantly, consistent with the downward trend seen in the time series.

3. Accuracy Evaluation Criteria

For this dataset, MAPE (Mean Absolute Percentage Error) is selected as the primary accuracy metric. Since the goal is to forecast future unemployment rate and communicate performance in interpretable terms, MAPE provides a clear percentage-based measure of average prediction error.

4. Models

4.1 Decomposition

#decomposition
stl_decomp <- stl(unemployed_ts, s.window = "periodic")
plot(stl_decomp)

attributes(stl_decomp)
## $names
## [1] "time.series" "weights"     "call"        "win"         "deg"        
## [6] "jump"        "inner"       "outer"      
## 
## $class
## [1] "stl"
  • This additive decomposition shows a very strong and regular seasonal pattern, repeating every year with consitent peaks and troughs.

  • Seasonal values range about +1 to –1, and High in Jan–Mar, low in Jun–Aug. May caused by early-year layoffs raise unemployment; summer hiring lowers it.

# seasonal adjustment
seasadj(stl_decomp)
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2010 10.069272  9.970443  9.891614  9.803698  9.725782  9.616041  9.526300
## 2011  8.969272  8.970443  8.891614  8.903698  8.925782  9.016041  9.126300
## 2012  9.469272  9.570443  9.591614  9.603698  9.625782  9.616041  9.526300
## 2013  9.069272  9.070443  8.991614  9.003698  8.925782  8.916041  8.926300
## 2014  7.969272  7.770443  7.591614  7.503698  7.325782  7.116041  6.926300
## 2015  6.169272  6.070443  5.991614  5.903698  5.725782  5.616041  5.426300
## 2016  5.169272  5.170443  5.191614  5.103698  5.125782  5.216041  5.226300
## 2017  4.669272  4.570443  4.491614  4.503698  4.525782  4.616041  4.626300
## 2018  4.269272  4.170443  4.191614  4.103698  4.125782  4.016041  4.026300
## 2019  4.269272  4.170443  4.091614  4.003698  3.925782  3.816041  3.826300
##            Aug       Sep       Oct       Nov       Dec
## 2010  9.414213  9.422126  9.299997  9.187868  9.072647
## 2011  9.214213  9.322126  9.399997  9.387868  9.472647
## 2012  9.414213  9.322126  9.199997  9.187868  9.172647
## 2013  8.914213  8.722126  8.599997  8.387868  8.172647
## 2014  6.814213  6.622126  6.499997  6.387868  6.272647
## 2015  5.214213  5.222126  5.099997  5.187868  5.172647
## 2016  5.314213  5.222126  5.199997  4.987868  4.872647
## 2017  4.614213  4.522126  4.499997  4.387868  4.272647
## 2018  4.014213  4.122126  4.199997  4.287868  4.272647
## 2019  3.814213  3.822126  3.799997  3.887868  3.972647
# Plot a line on the graph
plot(unemployed_ts)
lines(seasadj(stl_decomp), col="Red")

After removing seasonality, the underlying downward trend becomes clearer, and seasonal fluctuations are moderate but noticeable.

4.2 Naive Model

# Model output
forecast_naive <- naive(unemployed_ts,12)
plot(forecast_naive, col= 'red')

  • The naïve model simply uses the last observed value of the time series as the forecast for all future periods.
  • The shaded region represents the forecast intervals, showing increasing uncertainty over time.

4.2.1 Residual Analysis

nv_residual <- residuals(forecast_naive)
plot(nv_residual, main="Residuals from Naïve Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")

The residuals from the naïve model show clear, repeated patterns rather than random noise, indicating the model is not capturing the underlying structure of the data. This happens because the naïve forecast ignores both trend and seasonality, leading to systematic over- and under-prediction.

hist(nv_residual, main="Histogram of Naïve Forecast Residuals",
     xlab="Residuals", col="lightblue", border="white")

The residuals are centered near zero, but the distribution isn’t symmetric.
That indicats the Naïve model doesn’t fit the data perfectly — some structure like seasonality is still left in the residuals.

plot(as.numeric(fitted(forecast_naive)), as.numeric(nv_residual),
     main = "Residuals vs Fitted (Naïve Model)",
     xlab = "fitted values", ylab = "Residuals")
abline(h = 0, col = "red")

The residuals vs. fitted plot shows clear horizontal banding, meaning the errors cluster at specific levels instead of being randomly scattered. This confirms the naïve model fails to capture the structure of the data and leaves strong patterns in the residuals.

plot(as.numeric(unemployed_ts),as.numeric(nv_residual),
     main = "Actual Values vs Residuals (Naïve Model)",
     xlab = "Actual Values", ylab = "Residuals",
     type = "p", pch = 19, col = "darkblue", cex= 0.8)

abline(h = 0, col = "red")

The actual-vs-residuals plot again shows clear horizontal bands, meaning the residuals take only a few repeated values instead of forming a random cloud. This confirms that the naïve model leaves strong structure unexplained and is not capturing the true dynamics of the series.

Acf(nv_residual, main="ACF of Residuals - Naïve Forecast")

The ACF of the naïve model’s residuals shows strong autocorrelation at many lags, especially early lags, which stay well above the significance bounds. This indicates that the naïve model leaves substantial structure unmodeled, and the residuals are not white noise, confirming poor fit.

4.2.2 Accuracy & Conclusion

accuracy(forecast_naive)
##                      ME      RMSE        MAE       MPE     MAPE      MASE
## Training set -0.0512605 0.1016668 0.07815126 -0.792328 1.256982 0.1079327
##                   ACF1
## Training set 0.5610783
forecast_naive$mean
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2020   4   4   4   4   4   4   4   4   4   4   4   4
autoplot(forecast_naive) +
  ggtitle("Naïve Forecast") +
  ylab("Unemployment(Percentage)") +
  xlab("Year") +
  theme_minimal()

Accuracy: The naïve model performs poorly — its MAPE is about 1.26, meaning it has large relative errors, and the ACF1 value of 0.56 shows strong autocorrelation in residuals (a bad sign). This indicates the model does not capture trend or seasonality.

What it predicts: The naïve model simply predicts that next month’s unemployment rate will be equal to the current month’s rate, offering no meaningful insight beyond repeating the last observed value.

4.3 Moving Averages

# Compute moving averages of different orders
MA3_forecast <- ma(unemployed_ts, order = 3)
MA6_forecast <- ma(unemployed_ts, order = 6)
MA9_forecast <- ma(unemployed_ts, order = 9)

# Plot original time series
plot(unemployed_ts, main = "Simple Moving Averages (Orders 3, 6, 9)",
     ylab = "Tourist Arrivals (Thousands)", xlab = "Year", col = "black")

# Add the three moving averages
lines(MA3_forecast, col = "red", lwd = 2)
lines(MA6_forecast, col = "blue", lwd = 2)
lines(MA9_forecast, col = "green", lwd = 2)

# Forecast next 4 quarters using order 6 (bonus)
fit_ma6 <- auto.arima(na.omit(MA6_forecast))
forecast_ma6 <- forecast(fit_ma6, h = 12)
lines(forecast_ma6$mean, col = "purple", lwd = 2)

legend("bottomleft",
       legend = c("Original", "MA(3)", "MA(6)", "MA(9)", "Forecast MA(6)"),
       col = c("black", "red", "blue", "green", "purple"),
       lwd = 2, 
       bty = "n",
       cex = 0.6,
       inset = 0.02)

Observations: - As the moving average order increases, the series becomes progressively smoother.
- The MA(3) line (red) still follows most of the short-term fluctuations.
- The MA(6) line (blue) reduces seasonal noise and captures medium-term trends better.
- The MA(9) line (green) is the smoothest, showing the long-term movement but lagging behind actual changes.
- Using MA(6), the forecast for the next next 12 month predicts stable arrivals with mild variations.

Overall, increasing the order of the moving average reduces short-term noise but delays response to sudden changes in the data.

4.3 Simple Smoothing

ss_forecast <- ses(unemployed_ts, h = 12)
plot(ss_forecast)

ss_forecast$model
## Simple exponential smoothing 
## 
## Call:
## ses(y = unemployed_ts, h = 12)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 10.1132 
## 
##   sigma:  0.1021
## 
##      AIC     AICc      BIC 
## 30.87491 31.08180 39.23738

The SES model behaves almost like a naïve model because α is extremely close to 1, meaning it heavily tracks the latest value and does not smooth much of the past. This makes SES less suitable for your data, which has a clear long-term downward trend and seasonal structure.

4.3.1 Residual Analysis

ss_residual <- residuals(ss_forecast)
plot(ss_residual, main="Residuals from Simple Smoothing Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")

The SES residuals show clear patterns rather than random noise, indicating the model fails to capture important structure in the data. The residuals also exhibit clustered positive and negative runs, confirming SES is not fitting the trend or seasonality well.

hist(ss_residual, main = "Histogram of SS model Residual", xlab="Residuals", col="lightblue", border="white")

Observations: - The SES residual histogram is roughly centered around zero, showing no clear systematic bias. - However, the distribution is skewed and not symmetric, confirming that SES does not fully capture the structure in the data.

plot(as.numeric(fitted(ss_forecast)), as.numeric(ss_residual),
     main="Fitted Values vs Residuals (SS Model)",
     xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")

Observations: - The residuals vs. fitted plot shows that the points form clear horizontal bands rather than a random cloud. - This indicates systematic patterns remain, meaning the SES model still fails to capture important structure in the data.

plot(as.numeric(unemployed_ts), as.numeric(ss_residual),
     main = "Actual Values vs Residuals (SS Model)",
     xlab = "Actual Values", ylab = "Residuals",
     type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")

The residuals vs. actual values plot shows clear horizontal clustering, meaning the SES model produces repeated residual patterns rather than random noise. This indicates the model still fails to capture key structure in the data, so the fit remains inadequate.

Acf(ss_residual, main="ACF of Residuals - SS Forecast")

The ACF plot shows strong positive autocorrelation at many early lags, meaning the residuals are not behaving like white noise. This confirms that the SES model misses important structure in the data and is not an adequate forecasting model.

4.3.2 Accuracy & summary

accuracy(ss_forecast)
##                       ME      RMSE        MAE        MPE     MAPE      MASE
## Training set -0.05094848 0.1012563 0.07761731 -0.7868965 1.247713 0.1071953
##                   ACF1
## Training set 0.5583315
ss_forecast
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2020        3.99999 3.869130 4.130850 3.799857 4.200123
## Feb 2020        3.99999 3.814935 4.185045 3.716973 4.283007
## Mar 2020        3.99999 3.773349 4.226631 3.653372 4.346608
## Apr 2020        3.99999 3.738289 4.261691 3.599753 4.400227
## May 2020        3.99999 3.707401 4.292579 3.552514 4.447466
## Jun 2020        3.99999 3.679476 4.320504 3.509806 4.490174
## Jul 2020        3.99999 3.653796 4.346184 3.470532 4.529448
## Aug 2020        3.99999 3.629894 4.370086 3.433977 4.566003
## Sep 2020        3.99999 3.607444 4.392536 3.399643 4.600337
## Oct 2020        3.99999 3.586211 4.413769 3.367169 4.632811
## Nov 2020        3.99999 3.566015 4.433965 3.336283 4.663697
## Dec 2020        3.99999 3.546718 4.453262 3.306771 4.693209
plot(ss_forecast)

  • The SES model performs very well, with a MAPE of about 1.25%, showing highly accurate short-term predictions. It forecasts the series to stay near 3.99999 over the next year, reflecting a stable level with no trend or seasonality. The positive ACF1 indicates some remaining structure that SES does not fully capture.

4.4 Holt-Winters

4.4.1 Holt Winters Model perform

hw_forecast <- hw(unemployed_ts)
forecast_hw <- forecast(hw_forecast, h = 12)
forecast_hw$model
## Holt-Winters' additive method 
## 
## Call:
## hw(y = unemployed_ts)
## 
##   Smoothing parameters:
##     alpha = 0.7758 
##     beta  = 0.164 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 10.258 
##     b = -0.109 
##     s = 0.0462 0.0217 0.0216 0.008 0.0077 -0.0113
##            -0.0277 -0.0421 -0.0215 -0.0208 0.0066 0.0116
## 
##   sigma:  0.0898
## 
##      AIC     AICc      BIC 
## 12.94893 18.94893 60.33629
  • α (alpha) = 0.7758 → The model places substantial weight on recent observations when updating the level.
  • β (beta) ≈ 0.164 → The trend component is updated gradually, indicating a relatively stable downward trend.
  • γ (gamma) ≈ 1e-04 → The seasonal pattern is very stable over time, with almost no change from year to year.
  • σ (sigma) = 0.0898 → Indicates relatively low residual variability.

4.4.2 Residual Analysis — Holt–Winters Model

hw_residual <- residuals(forecast_hw)

#plot residual
plot(hw_residual, main = "Residuals from Holt Winter Model", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")

  • The residuals are centered around zero, indicating that the model has no systematic over- or underestimation bias.
  • Most residuals within the range of ±0.2, suggesting stable forecast performance.
  • A few larger deviations (e.g., around 2017 and 2015) indicate that the model did not fully capture certain short-term irregular shocks.
  • There is no visible repeating pattern or autocorrelation, implying that trend and seasonality have been effectively modeled.
#histogram
hist(hw_residual, main = "Histogram of Holt Winter model Residual", xlab="Residuals", col="lightblue", border="white")

  • The residuals are mostly concentrated around zero, indicating that the model has captured the main trend and seasonality effectively.
  • The distribution is slightly right-skewed, with a few larger negative residuals (around −0.2), suggesting that the model slightly underpredicted some observations.
  • Most errors fall within the range of −0.2 to +0.2, implying limited variance and stable forecast performance.

The residual histogram is approximately centered near zero with mild right skewness, confirming that the Holt–Winters model fits the data well overall but slightly underestimates a few low seasonal points.

# fitted fitted vs residuals
plot(as.numeric(fitted(forecast_hw)), as.numeric(hw_residual),
     main="Fitted vs Residuals (Holt Winter Model)",
     xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")

Residuals are mostly concentrated within ±0.2, indicating that the model provides a close fit to the observed data. Only a few larger residuals appear when fitted values fall between 5 and 6, but no clear systematic pattern is present. This random scatter around zero suggests that the Holt–Winters model captures the main trend and seasonality effectively, with only minor localized errors.

# real values vs residuals
plot(as.numeric(unemployed_ts), as.numeric(hw_residual),
     main = "Actual Values vs Residuals (Holt Winter Model)",
     xlab = "Actual Values", ylab = "Residuals",
     type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")

Most points fall within ±0.2 residuals, indicating the model fits the data well. There are only a few (≈2) larger residuals when actual values are in the 5–6 range, but these are exceptions. Overall, the Holt–Winters model captures the bulk of the trend and seasonal structure with only minor localized errors.

#ACF plot of residuals
Acf(hw_residual, main="ACF of Residuals - Holt Winter Forecast")

ACF of Residuals

  • Most ACF spikes fall within the 95% confidence bands, indicating the Holt–Winters model captures most of the serial dependence in the data.
  • However, a few early-lag spikes still exceed the bands, showing that some structure remains unexplained and the residuals are not perfectly white noise.

4.4.3 Accuracy & Summary

accuracy(forecast_hw)
##                       ME       RMSE        MAE       MPE     MAPE       MASE
## Training set 0.006047645 0.08362116 0.06870748 0.1403796 1.187193 0.09489012
##                   ACF1
## Training set 0.5620648
forecast_hw
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2020       3.949446 3.834332 4.064559 3.773395 4.125497
## Feb 2020       3.954453 3.796479 4.112427 3.712853 4.196053
## Mar 2020       3.937111 3.734376 4.139846 3.627054 4.247167
## Apr 2020       3.946437 3.696635 4.196239 3.564398 4.328476
## May 2020       3.935865 3.636587 4.235143 3.478159 4.393571
## Jun 2020       3.960216 3.609057 4.311375 3.423165 4.497267
## Jul 2020       3.986585 3.581183 4.391986 3.366577 4.606592
## Aug 2020       4.015636 3.553691 4.477580 3.309153 4.722119
## Sep 2020       4.025927 3.505203 4.546650 3.229549 4.822304
## Oct 2020       4.049626 3.467954 4.631299 3.160035 4.939218
## Nov 2020       4.059753 3.415023 4.704483 3.073724 5.045782
## Dec 2020       4.094166 3.384331 4.804001 3.008568 5.179765
plot(forecast_hw)

Model Accuracy The Holt–Winters additive model provides strong accuracy, with a very low MAPE of about 1.19%, indicating that forecast errors are small on average and the model captures both trend and seasonality well.

Forecast for the Next Year It predicts that unemployment will remain stable over the next year, with point forecasts mostly between 3.94 and 4.05 throughout 2020, showing a slight upward drift but no major changes.

Overall, the model fits the data well, handles seasonal patterns effectively, and produces smooth, realistic forecasts with narrow confidence intervals.

4.5 ARIMA

auto_fit <- auto.arima(unemployed_ts)
attributes(auto_fit)
## $names
##  [1] "coef"      "sigma2"    "var.coef"  "mask"      "loglik"    "aic"      
##  [7] "arma"      "residuals" "call"      "series"    "code"      "n.cond"   
## [13] "nobs"      "model"     "xreg"      "bic"       "aicc"      "x"        
## [19] "fitted"   
## 
## $class
## [1] "forecast_ARIMA" "ARIMA"          "Arima"
forecast_arima = forecast(auto_fit,h = 12,level=c(99.5))

forecast_arima$model
## Series: unemployed_ts 
## ARIMA(3,1,0)(2,0,0)[12] with drift 
## 
## Coefficients:
##          ar1     ar2      ar3     sar1     sar2    drift
##       0.3529  0.5959  -0.2016  -0.0679  -0.3012  -0.0507
## s.e.  0.0914  0.0792   0.0927   0.0951   0.0919   0.0158
## 
## sigma^2 = 0.003579:  log likelihood = 167.65
## AIC=-321.3   AICc=-320.29   BIC=-301.85

The automatic ARIMA procedure selected an ARIMA(3,1,0)(2,0,0)12 with drift, meaning the model captures short-term AR structure, a seasonal AR component, and an overall downward trend through differencing and drift. The estimated parameters (ar1 = 0.3529, ar2 = 0.5959, ar3 = –0.2016, sar1 = –0.0679, sar2 = –0.0312) and the small estimated variance (σ² ≈ 0.0036) indicate a stable model fit.

Overall, the model is well-specified for capturing both trend and annual seasonality, and it provides a strong statistical foundation for producing year-ahead forecasts.

4.5.1 Residual

arima_residual <- residuals(forecast_arima)
plot(arima_residual, main = "Residual of ARIMA", xlab = "Year")
abline(h=0, col="red")

Your residuals fluctuate around 0, which indicates that the model does not systematically over- or under-predict. Residuals look like white noise: no visible trend or seasonality. There are a few big residual spikes, especially around late 2013 and late 2016, possible unexpected events there.

hist(arima_residual, main = "Histogram of ARIMA model Residual", xlab="Residuals", col="lightblue", border="white")

The shape is roughly centered around zero The distribution is fairly symmetric, although not perfectly. No extremely heavy tails or skewness.

plot(as.numeric(fitted(forecast_arima)), as.numeric(arima_residual),
     main = "Fitted and Residuals of ARIMA Model",
     xlab = "Fitted Values",
     ylab = "Residuals"
     )
abline(h=0, col= "red")

The residuals appear randomly scattered around zero with no clear pattern, suggesting the ARIMA model captures most of the underlying structure in the data. > However, slight clustering at higher fitted values indicates some remaining seasonality or nonlinear effects that the ARIMA model does not fully explain.

plot(as.numeric(unemployed_ts), as.numeric(arima_residual),
     main = "Real Value vs Residuals of ARIMA",
     xlab = "Real Values",
     ylab = "Residuals"
     )

abline(h = 0, col= "red")

The residuals show no clear pattern across the range of actual unemployment values, suggesting the ARIMA model does not systematically over- or under-predict at specific levels. A small cluster of points at higher actual values hints at slight remaining structure, but overall the residuals behave reasonably randomly around zero.

Acf(arima_residual)

All spikes fall inside the bands, no significant remaining autocorrelation. No repeating seasonal pattern and no obvous structure So overall, the ACF suggests that the ARIMA model has done a reasonable job: the residuals look mostly random, meaning the model has captured the trend and short-term correlation in the data.

4.5.2 Accuracy& Summary

accuracy(forecast_arima)
##                        ME       RMSE        MAE        MPE      MAPE       MASE
## Training set 0.0005022391 0.05805211 0.04700228 0.04148005 0.7929878 0.06491364
##                     ACF1
## Training set -0.02333479
forecast_arima
##          Point Forecast  Lo 99.5  Hi 99.5
## Jan 2020       4.051437 3.883511 4.219362
## Feb 2020       4.115145 3.832638 4.397651
## Mar 2020       4.130416 3.682063 4.578770
## Apr 2020       4.158380 3.562627 4.754134
## May 2020       4.144127 3.386519 4.901735
## Jun 2020       4.149031 3.243134 5.054928
## Jul 2020       4.109461 3.053290 5.165632
## Aug 2020       4.063127 2.867138 5.259116
## Sep 2020       3.981989 2.648658 5.315321
## Oct 2020       3.896693 2.434237 5.359150
## Nov 2020       3.801711 2.214022 5.389400
## Dec 2020       3.734294 2.027927 5.440660
plot(forecast_arima, main = "Unemployment Next Year from ARIMA")

forecast_arima_N2 = forecast(auto_fit,h = 24,level=c(99.5))
plot(forecast_arima_N2, main = "Unemployment Next Two Year from ARIMA")

  • ARIMA’s prediction error is about 0.79%, which is quite good for unemployment forecasting.
  • The model forecasts unemployment to stay stable around 3.7–4.2% throughout 2020. This reflects a continuation of the recent stable trend.

ARIMA achieves strong overall accuracy with low error rates (MAPE 0.79%), no bias, and better performance than the naive benchmark, indicating a well-fitted model.

5. Accuracy Summary

# Create accuracy comparison table
model_comparison <- rbind(
  naive = accuracy(forecast_naive),
  ss = accuracy(ss_forecast),
  ma = accuracy(forecast_ma6),
  hw = accuracy(forecast_hw),
  arima = accuracy(forecast_arima)
)

rownames(model_comparison) <- c( "Naive",  "SES", "Moving Average","Holt Winter", "ARIMA")
round(model_comparison[, c("RMSE","MAE","MAPE","MASE")], 3)
##                 RMSE   MAE  MAPE  MASE
## Naive          0.102 0.078 1.257 0.108
## SES            0.101 0.078 1.248 0.107
## Moving Average 0.007 0.006 0.099 0.008
## Holt Winter    0.084 0.069 1.187 0.095
## ARIMA          0.058 0.047 0.793 0.065

Best and Worst Models by Metric

Accuracy Measure Best Model Worst Model Interpretation
MAPE Moving Average Naïve MA performs best on a scaled basis, far outperforming all others.

6. Forcast without windows using ARIMA

auto_fit2 <- auto.arima(ts_unemployed)
attributes(auto_fit2)
## $names
##  [1] "coef"      "sigma2"    "var.coef"  "mask"      "loglik"    "aic"      
##  [7] "arma"      "residuals" "call"      "series"    "code"      "n.cond"   
## [13] "nobs"      "model"     "bic"       "aicc"      "x"         "fitted"   
## 
## $class
## [1] "forecast_ARIMA" "ARIMA"          "Arima"
forecast_arima2 = forecast(auto_fit2,h = 12,level=c(99.5))
plot(forecast_arima2, main = "Unemployment Next Year from ARIMA without windowing")

accuracy(forecast_arima2)
##                        ME       RMSE       MAE         MPE      MAPE       MASE
## Training set -0.002646581 0.06835575 0.0539397 -0.02653067 0.7628524 0.05421286
##                    ACF1
## Training set 0.01269474
forecast_arima2
##          Point Forecast  Lo 99.5  Hi 99.5
## Jan 2020       4.096669 3.902574 4.290764
## Feb 2020       4.229977 3.867452 4.592501
## Mar 2020       4.332123 3.738809 4.925437
## Apr 2020       4.433075 3.586449 5.279701
## May 2020       4.501659 3.383952 5.619367
## Jun 2020       4.578219 3.189472 5.966965
## Jul 2020       4.627073 2.966072 6.288073
## Aug 2020       4.653120 2.724142 6.582098
## Sep 2020       4.668446 2.471881 6.865011
## Oct 2020       4.675410 2.219469 7.131351
## Nov 2020       4.679289 1.969602 7.388976
## Dec 2020       4.690124 1.734497 7.645752
forecast_arima_N2_2 = forecast(auto_fit2,h = 24,level=c(99.5))
forecast_arima_N2_2
##          Point Forecast     Lo 99.5  Hi 99.5
## Jan 2020       4.096669  3.90257376 4.290764
## Feb 2020       4.229977  3.86745200 4.592501
## Mar 2020       4.332123  3.73880934 4.925437
## Apr 2020       4.433075  3.58644870 5.279701
## May 2020       4.501659  3.38395227 5.619367
## Jun 2020       4.578219  3.18947225 5.966965
## Jul 2020       4.627073  2.96607248 6.288073
## Aug 2020       4.653120  2.72414178 6.582098
## Sep 2020       4.668446  2.47188128 6.865011
## Oct 2020       4.675410  2.21946896 7.131351
## Nov 2020       4.679289  1.96960187 7.388976
## Dec 2020       4.690124  1.73449668 7.645752
## Jan 2021       4.697726  1.51239187 7.883060
## Feb 2021       4.728808  1.32769641 8.129920
## Mar 2021       4.759328  1.15626983 8.362385
## Apr 2021       4.781986  0.98931218 8.574660
## May 2021       4.800928  0.82682504 8.775031
## Jun 2021       4.821846  0.67576556 8.967926
## Jul 2021       4.828969  0.51816919 9.139770
## Aug 2021       4.827170  0.35867067 9.295669
## Sep 2021       4.827129  0.20537855 9.448879
## Oct 2021       4.827348  0.05826121 9.596435
## Nov 2021       4.815239 -0.09657328 9.727051
## Dec 2021       4.799516 -0.25034390 9.849376
plot(forecast_arima_N2_2, main = "Unemployment Next Two Year from ARIMA without windowing")

ARIMA Forecasting Using the Full Historical Series — Summary Analysis

  • Model used -ARIMA(3,1,0)(2,0,0)[12] with drift

    • Non-seasonal AR(3) and differencing of order 1
    • Seasonal AR(2) with yearly seasonality (period = 12)
    • A drift term, capturing long-term trend
    • This model explicitly picks up both trend + seasonal cycles across the entire 1970–2020 range.
  • MAPE ~ 0.76% → extremely accurate (sub-1% error is unusually strong for macro data)

  • Forecast Interpretation (Next 1–2 Years) > Next Year: 4.09->4.69 a slight increase > Next Two Year: 4.09 - 4.8 Signaling a continued but moderate upward trend.

Overall: The full-sample ARIMA model suggests unemployment has reached a cyclical bottom and is now expected to rise gradually over the next two years. However, although the full-sample ARIMA model is statistically accurate, it mixes multiple historical economic regimes and overweights old high-unemployment periods. This leads to biased and less realistic forecasts for today’s labor market. A recent data window better reflects current structural conditions and produces more relevant short-term predictions.

7. Summary Conclusion

Overall -In the most recent years, the unemployment rate becomes much more stable, fluctuating within a narrow band around 3.8–4.2% with very consistent seasonality. There is no strong upward or downward trend, and variability is lower than earlier decades, making the series highly predictable in the short term.

Based on the selected models and forecasts: - Next Year: Both Holt–Winters and ARIMA predict unemployment will remain stable around 3.8–4.2%, with only minor month-to-month fluctuations and no major upward or downward shifts. - Next Two Years: The models collectively indicate continued stability, with values likely holding in the 3.7–4.3% range, suggesting no significant structural changes in the unemployment trend.

Rank Model 1. Moving Average- Best accuracy (lowest MAPE). 2. ARIMA – Highest accuracy, balanced fit, reliable residuals 3. Holt–Winters Additive – Strong seasonal modeling, good accuracy 4. Simple Exponential Smoothing (SES) – Reasonable but lacks seasonal structure 5. Naïve Model – Weakest; fails to capture trend or seasonality